BINNED_STATISTIC
Overview
The BINNED_STATISTIC function computes summary statistics for data grouped into bins along a numeric axis. This is a generalization of a histogram: while a histogram counts the number of observations in each bin, BINNED_STATISTIC allows computation of mean, sum, median, standard deviation, minimum, maximum, or count for values falling within each bin.
This implementation wraps scipy.stats.binned_statistic from the SciPy library. The function takes two parallel arrays: an independent variable x that determines bin assignment, and a dependent variable values on which the statistic is computed. Data points are assigned to bins based on their x values, and the selected statistic is calculated from the corresponding values.
The binning process divides the range of x into equal-width intervals. By default, the range spans from the minimum to maximum values in x, but custom bounds can be specified using the lowerlimit and upperlimit parameters. Values of x outside the specified range are ignored. All bins except the rightmost are half-open intervals [a, b), meaning they include the left edge but exclude the right edge. The rightmost bin [a, b] includes both edges.
Supported statistics include:
- mean: arithmetic mean of values in each bin (empty bins return NaN)
- sum: sum of values, equivalent to a weighted histogram
- median: median value in each bin
- count: number of observations, equivalent to an unweighted histogram
- std: standard deviation (calculated with \text{ddof}=0)
- min / max: minimum or maximum value in each bin
For more information, see the SciPy binned_statistic documentation and the SciPy GitHub repository.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=BINNED_STATISTIC(x, values, bs_statistic, bins, lowerlimit, upperlimit)
x(list[list], required): Data to bin (independent variable).values(list[list], required): Data to compute statistics on (dependent variable).bs_statistic(str, optional, default: “mean”): Statistic to compute.bins(int, optional, default: 10): Number of bins.lowerlimit(float, optional, default: null): Lower bound for the bin range.upperlimit(float, optional, default: null): Upper bound for the bin range.
Returns (list[list]): 2D list of bin statistics, or error message string.
Examples
Example 1: Mean statistic in 2 bins
Inputs:
| x | values | bs_statistic | bins |
|---|---|---|---|
| 1 | 1 | mean | 2 |
| 1 | 1 | ||
| 2 | 2 | ||
| 5 | 1.5 | ||
| 7 | 3 |
Excel formula:
=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "mean", 2)
Expected output:
| Result |
|---|
| 1.3333 |
| 2.25 |
Example 2: Sum statistic in 2 bins
Inputs:
| x | values | bs_statistic | bins |
|---|---|---|---|
| 1 | 1 | sum | 2 |
| 1 | 1 | ||
| 2 | 2 | ||
| 5 | 1.5 | ||
| 7 | 3 |
Excel formula:
=BINNED_STATISTIC({1;1;2;5;7}, {1;1;2;1.5;3}, "sum", 2)
Expected output:
| Result |
|---|
| 4 |
| 4.5 |
Example 3: Median with custom range
Inputs:
| x | values | bs_statistic | bins | lowerlimit | upperlimit |
|---|---|---|---|---|---|
| 1 | 10 | median | 3 | 1 | 7 |
| 2 | 20 | ||||
| 3 | 30 | ||||
| 4 | 40 | ||||
| 5 | 50 | ||||
| 6 | 60 | ||||
| 7 | 70 |
Excel formula:
=BINNED_STATISTIC({1;2;3;4;5;6;7}, {10;20;30;40;50;60;70}, "median", 3, 1, 7)
Expected output:
| Result |
|---|
| 15 |
| 35 |
| 60 |
Example 4: Count statistic in 2 bins
Inputs:
| x | values | bs_statistic | bins |
|---|---|---|---|
| 1 | 10 | count | 2 |
| 2 | 20 | ||
| 3 | 30 | ||
| 4 | 40 | ||
| 5 | 50 |
Excel formula:
=BINNED_STATISTIC({1;2;3;4;5}, {10;20;30;40;50}, "count", 2)
Expected output:
| Result |
|---|
| 2 |
| 3 |
Python Code
from scipy.stats import binned_statistic as scipy_binned_statistic
import math
def binned_statistic(x, values, bs_statistic='mean', bins=10, lowerlimit=None, upperlimit=None):
"""
Computes a binned statistic (mean, sum, median, etc.) for the input data.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html
This example function is provided as-is without any representation of accuracy.
Args:
x (list[list]): Data to bin (independent variable).
values (list[list]): Data to compute statistics on (dependent variable).
bs_statistic (str, optional): Statistic to compute. Valid options: Mean, Sum, Median, Count, Std, Min, Max. Default is 'mean'.
bins (int, optional): Number of bins. Default is 10.
lowerlimit (float, optional): Lower bound for the bin range. Default is None.
upperlimit (float, optional): Upper bound for the bin range. Default is None.
Returns:
list[list]: 2D list of bin statistics, or error message string.
"""
def to2d(val):
return [[val]] if not isinstance(val, list) else val
# Convert scalars to 2D lists if needed
x = to2d(x)
values = to2d(values)
# Flatten 2D lists to 1D
try:
flat_x = [float(item) for row in x for item in (row if isinstance(row, list) else [row])]
flat_values = [float(item) for row in values for item in (row if isinstance(row, list) else [row])]
except Exception:
return "Invalid input: x and values must be 2D lists or scalars of numbers."
if len(flat_x) == 0 or len(flat_values) == 0:
return "Invalid input: x and values must not be empty."
if len(flat_x) != len(flat_values):
return "Invalid input: x and values must have the same length."
if bs_statistic not in ('mean', 'sum', 'median', 'count', 'std', 'min', 'max'):
return "Invalid input: bs_statistic must be one of 'mean', 'sum', 'median', 'count', 'std', 'min', 'max'."
try:
nbins = int(bins)
except Exception:
return "Invalid input: bins must be an integer."
if nbins < 1:
return "Invalid input: bins must be >= 1."
# Prepare range
range_arg = None
if lowerlimit is not None and upperlimit is not None:
try:
range_arg = (float(lowerlimit), float(upperlimit))
except Exception:
return "Invalid input: lowerlimit and upperlimit must be numbers."
try:
if range_arg:
res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins, range=range_arg)
else:
res = scipy_binned_statistic(flat_x, flat_values, statistic=bs_statistic, bins=nbins)
stat = res.statistic
except Exception as e:
return f"scipy.stats.binned_statistic error: {e}"
# Return as 2D list (column vector), converting NaN to None for Excel compatibility
result = []
for val in stat:
if val is None or (isinstance(val, float) and math.isnan(val)):
result.append([""])
else:
result.append([float(val)])
return result